skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Mordahl, Austin"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing works use a single, aggregated metric (i.e., accuracy) which is difficult for users to assess NLP model performance on fine-grained aspects, such as LCs. To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as inputs and produces diverse test suite with test oracles for each of given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 LCs over the two NLP applications. 
    more » « less
  2. null (Ed.)
  3. Variability in C software is a useful tool, but critical bugs that only exist in certain configurations are easily missed by conventional debugging techniques. Even with a small number of features, the configuration space of configurable software is too large to analyze exhaustively. Variability-aware static analysis for bug detection is being developed, but remains at too early a stage to be fully usable in real-world C programs. In this work, we present a methodology of finding variability bugs by combining variability-oblivious bug detectors, static analysis of build processes, and dynamic feature interaction inference. We further present an empirical study in which we test our methodology on two highly configurable C programs. We found our methodology to be effective, finding 88 true bugs between the two programs, of which 64 were variability bugs. 
    more » « less
  4. Many critical software systems developed in C utilize compile-time configurability. The many possible configurations of this software make bug detection through static analysis difficult. While variability-aware static analyses have been developed, there remains a gap between those and state-of-the-art static bug detection tools. In order to collect data on how such tools may perform and to develop real-world benchmarks, we present a way to leverage configuration sampling, off-the-shelf “variability-oblivious” bug detectors, and automatic feature identification techniques to simulate a variability-aware analysis. We instantiate our approach using four popular static analysis tools on three highly configurable, real-world C projects, obtaining 36,061 warnings, 80% of which are variability warnings. We analyze the warnings we collect from these experiments, finding that most results are variability warnings of a variety of kinds such as NULL dereference. We then manually investigate these warnings to produce a benchmark of 77 confirmed true bugs (52 of which are variability bugs) useful for future development of variability-aware analyses. 
    more » « less